Abstract

A transmitter without channel state information (CSI) wishes to send a delay-limited Gaussian source 
over a slowly fading channel. The source is coded in superimposed layers, with each layer successively 
refining the description in the previous one. The receiver decodes the layers that are supported by the 
channel realization and reconstructs the source up to a distortion. The expected distortion is minimized 
by optimally allocating the transmit power among the source layers. For two source layers, the allocation 
is optimal when power is first assigned to the higher layer up to a power ceiling that depends only on 
the channel fading distribution; all remaining power, if any, is allocated to the lower layer. For convex 
distortion cost functions with convex constraints, the minimization is formulated as a convex optimization 
problem. In the limit of a continuum of infinite layers, the minimum expected distortion is given by the 
solution to a set of linear differential equations in terms of the density of the fading distribution. As the 
bandwidth ratio b (channel uses per source symbol) tends to zero, the power distribution that minimizes 
expected distortion converges to the one that maximizes expected capacity. While expected distortion can 
be improved by acquiring CSI at the transmitter (CSIT) or by increasing diversity from the realization of 
independent fading paths, at high SNR the performance benefit from diversity exceeds that from CSIT,
especially when b is large. 

Index Terms 

Broadcast channel coding, source coding, successive refinement, layer, superposition, optimal power 
allocation, distortion minimization, convex optimization. 

I. Introduction 

In an ergodic wireless channel, from the source-channel separation theorem [1], it is optimal to first
compress the source and incur the associated distortion at a rate equal to the channel capacity, then 
send the compressed representation over the channel at capacity with asymptotically small error. However, 
when delay constraints stipulate that the receiver decodes within a single realization of a slowly fading 
channel, without channel state information (CSI) at the transmitter, the transmission over a single fading 
block is non-ergodic and source-channel separation is not necessarily optimal. In this case it is possible to 
reduce the end-to-end distortion of the reconstructed source by jointly optimizing the source-coding rate 
and the transmit power allocation based on the characteristics of the source and the channel. In particular, 
we consider using the layered broadcast coding approach with successive refinement in the transmission 
of a Gaussian source over a slowly fading channel, in the absence of CSI at the transmitter. First we 
assume the channel has a finite number of discrete fading states, then we extend the results to continuous 
fading distributions, for example, Rayleigh fading with diversity from the realization of independent 
fading paths. The source is coded in layers, with each layer successively refining the description in the 
previous one. The transmitter simultaneously transmits the codewords of all layers to the receiver by 
superimposing them with an appropriate power allocation. The receiver successfully decodes the layers 
supported by the channel realization, and combines the descriptions in the decoded layers to reconstruct 
the source up to a distortion. In this paper, we are interested in minimizing the expected distortion, and 
more generally, a convex distortion cost function, of the reconstructed source by optimally allocating 
the transmit power among the layers of codewords. The system model is applicable to communication 
systems with real-time traffic where it is difficult for the transmitter to learn the channel condition. For 
example, in a satellite voice system, it is desirable to consider the efficient transmission of the voice 
streams over uncertain channels that minimize the end-to-end distortion. 

The broadcast strategy is proposed in [2] to characterize the set of achievable rates when the channel 
state is unknown at the transmitter. In the case of a Gaussian channel under Rayleigh fading, [3] describes 
the layered broadcast coding approach, and derives the optimal power allocation that maximizes expected
capacity when the channel has a single-antenna transmitter and receiver. The layered broadcast approach 
is extended to multiple-antenna channels and the corresponding achievable rates are presented in [4]. In 
[5], coding theorems are presented for the broadcast approach with delayed error-free feedback under 
decoding delay constraints. 

In the transmission of a Gaussian source over a Gaussian channel, uncoded transmission is optimal [6] 
in the special case when the source bandwidth equals the channel bandwidth [7]. For other bandwidth 
ratios, hybrid digital-analog joint source-channel transmission schemes are studied in [8]-[10]; in these
works the codes are designed to be optimal at a target SNR but degrade gracefully should the realized 
SNR deviate from the target. In particular, [9] conjectures that no code is simultaneously optimal at 
different SNRs when the source and channel bandwidths are not equal. In this paper, the code considered 
is not targeted for a specific fading state; we minimize a convex distortion cost function over the fading 
distribution of the channel. 

In [11], the minimum distortion is investigated in the transmission of a source over two independently 
fading channels in terms of the distortion exponent, which is defined as the exponential decay rate of the 
expected distortion in the high SNR regime. Upper bounds on the distortion exponent and achievable joint 
source-channel schemes are presented in [12] for a single-antenna quasi-static Rayleigh fading channel, 
and later in [13], [14] for multiple-antenna channels. One of the proposed schemes in [13], layered source 
coding with progressive transmission (LS), is analyzed in terms of expected distortion for a finite number 
of layers at finite SNR in [15]. The results in [12], [13] show that the broadcast strategy with layered 
source coding under an appropriate power allocation scheme is optimal for multiple-input single-output 
(MISO) and single-input multiple-output (SIMO) systems in terms of the distortion exponent. Numerical 
optimization of the power allocation with constant rate among the layers is examined in [16], while [17] 
considers the optimization of power and rate allocation and presents approximate solutions in the high 
SNR regime. Motivated by the optimality of the broadcast strategy in the high SNR regime, in this work, 
first presented in [18], [19], we investigate minimizing a convex distortion cost function, and in particular, 
the expected distortion, at any arbitrary finite SNR. 

In a recent work in [20], the minimization of a linear distortion cost function, i.e., the expected 
distortion, is considered, and optimal power allocation algorithms are presented for discrete and continuous 
channel fading. In this paper, we study how the properties of the optimal power allocation are affected 
by the channel-source bandwidth ratio, operating SNR, channel quantization, diversity order, and the 
related metric of capacity maximization. Moreover, when the channel has discrete fading states, we also 
consider the minimization of an arbitrary convex distortion cost function under convex constraints, by
formulating the distortion minimization as a convex optimization problem. In minimizing the expected 
distortion, [20] presents algorithms that calculate the optimal rate vector and power allocation based on a 
linear distortion cost function; general convex cost functions and constraints on the distortion realizations 
are not considered. We show that the feasible distortion region is convex in layered broadcast coding 
with successive refinement, and the minimization of convex distortion cost functions can be efficiently 
solved with convex optimization numerical techniques. The minimization of a general convex distortion 
cost function under continuous channel fading distributions, however, remains an open research problem. 

The remainder of the paper is organized as follows. The system model is presented in Section II, 
and the layered broadcast coding scheme with successive refinement is explained in more detail in 
Section III. Section IV focuses on the optimal power allocation between two layers, with the analysis being 
extended in Section V to consider minimizing a convex distortion cost function over power allocation 
among multiple discrete layers. The optimal power allocation for discretized Rayleigh fading distributions
is presented in Section VI. Minimizing expected distortion under continuous fading distributions is
treated in Section VII by studying the limiting process as the channel discretization resolution increases.
Section VIII considers the optimal power distribution and minimum expected distortion in Rayleigh fading 
channels with diversity, followed by conclusions in Section IX. 

II. System Model 

Consider the system model illustrated in Fig. 1: A transmitter wishes to send a Gaussian source over a
wireless channel to a receiver, at which the source is to be reconstructed up to a distortion. Let the source
be denoted by $s$, which is a sequence of independent identically distributed (iid) zero-mean circularly
symmetric complex Gaussian (ZMCSCG) random variables with unit variance: $s \in \mathbb{C} \sim \mathcal{CN}(0, 1)$. The
transmitter and the receiver each have a single antenna and the channel is described by

$$y = Hx + n, \tag{1}$$

where $x \in \mathbb{C}$ is the transmit signal, $y \in \mathbb{C}$ is the received signal, and $n \in \mathbb{C} \sim \mathcal{CN}(0, 1)$ is iid
unit-variance ZMCSCG noise.

Suppose the distribution of the channel power gain is described by the probability density function
(pdf) $f(\gamma)$, where $\gamma = |h|^2$ and $h \in \mathbb{C}$ is a realization of $H$. We first consider fading distributions with
a finite number of discrete fading states; subsequently we generalize to continuous fading distributions.
The receiver has perfect CSI but the transmitter has only channel distribution information (CDI), i.e., the
transmitter knows the pdf $f(\gamma)$ but not its instantaneous realization. The channel is modeled by a
quasi-static block fading process: $H$ is realized iid at the onset of each fading block and remains unchanged
over the block duration. We assume decoding at the receiver is delay-limited; namely, delay constraints
preclude coding across fading blocks but dictate that the receiver decodes at the end of each block. Hence
the transmission over a single fading block is non-ergodic.

Fig. 1. Source-channel coding without CSI at the transmitter.

Suppose each fading block spans $N$ channel uses, over which the transmitter describes $K$ of the source
symbols. We define the bandwidth ratio as $b \triangleq N/K$, which relates the number of channel uses per
source symbol. At the transmitter there is a power constraint on the transmit signal $\mathrm{E}[|x|^2] \le P$, where
the expectation is taken over repeated channel uses over the duration of each fading block. We assume $K$
is large enough to consider the source as ergodic, and $N$ is large enough to design codes that achieve the
instantaneous channel capacity of a given fading state with negligible probability of error. At the receiver,
the channel output $y$ is used to reconstruct an estimate $\hat{s}$ of the source. The distortion $D$ is measured by
the mean squared error $\mathrm{E}[|s - \hat{s}|^2]$ of the estimator, where the expectation is taken over the $K$-sequence
of source symbols and the noise distribution. The instantaneous distortion of the reconstruction depends
on the fading realization of the channel; we are interested in minimizing the expected distortion E[D],
where the expectation is over the fading distribution, and more generally, a convex distortion cost function
with convex constraints in terms of the possible distortion realizations.

III. Layered Broadcast Coding with Successive Refinement 

To characterize the set of achievable rates when the channel state is unknown at the transmitter, a 
broadcast strategy is described in [2]. The transmitter designs its codebook by imagining it is communicating
with an ensemble of virtual receivers. Each virtual receiver corresponds to a fading state: the
realization of the fading state is taken as the channel gain of the virtual receiver. The realized rate at the 
original receiver is given by the decodable rate of the realized virtual receiver. A fading channel without 
transmitter CSI, therefore, can be modeled as a broadcast channel (BC). In particular, the capacity region 
of the BC defines the maximal set of achievable rates among the virtual receivers, which, in terms of 
the original fading channel, is the maximal set of realized rates among the fading states. In this work,
we derive the optimal operating point in the BC capacity region that minimizes a convex distortion cost 
function, and in particular, the expected distortion E[D], of the reconstructed source. 

For fading Gaussian channels, a layered broadcast coding approach is described in [3], [4]. In the 
layered broadcast approach, the virtual receivers are ordered according to their channel strengths: for 
single-antenna channels, the channel strength of a virtual receiver is given by the channel power gain of 
its corresponding fading state. We interpret each codeword intended for a virtual receiver as a layer of 
code, and the transmitter sends the superposition of all layers to the virtual receivers. The capacity region 
of a single-antenna Gaussian BC is achievable by successive decoding [21], in which each virtual receiver 
decodes, in addition to its own layer, all the layers below it (the ones with weaker channel strengths). 
Hence each layer represents the additional information over its lower layer that becomes decodable by 
the original receiver should the layer be realized. 

The layered broadcast approach fits particularly well with the successive refinability [22], [23] of
a Gaussian source. Successive refinability states that if a source is first described at rate $R_1$, then
subsequently refined at rate $R_2$, the overall distortion is the same as if the source were described at
rate $R_1 + R_2$ in the first place. As the Gaussian source is successively refinable, naturally, each layer in
the broadcast approach can be used to carry refinement information of a lower layer. Concatenation of 
broadcast channel coding with successive refinement source coding is shown in [12], [13] to be optimal 
in terms of the distortion exponent for MISO/SIMO systems. 

We apply the layered broadcast approach and successive refinability to perform source-channel coding
as outlined in Fig. 2. First we assume the fading distribution has $M$ non-zero discrete states: the channel
power gain realization is $\gamma_i > 0$ with probability $p_i$, for $i = 1, \dots, M$; and we denote $p_0 \triangleq \Pr\{\gamma = 0\}$.
Accordingly there are $M$ virtual receivers and the transmitter sends the sum of $M$ layers of codewords.
Let layer $i$ denote the layer of codeword intended for virtual receiver $i$, and we order the layers as
$\gamma_M > \cdots > \gamma_1 > 0$. We refer to layer $M$ as the highest layer and layer 1 as the lowest layer. Each
layer successively refines the description of the source $s$ from the layer below it, and the codewords
in different layers are independent. Let $P_i$ be the transmit power allocated to layer $i$, then the transmit
symbol $x$ can be written as

$$x = \sqrt{P_1}\, x_1 + \sqrt{P_2}\, x_2 + \cdots + \sqrt{P_M}\, x_M, \tag{2}$$

where $x_1, \dots, x_M$ are iid ZMCSCG random variables with unit variance.

With successive decoding, each virtual receiver first decodes and cancels the lower layers before
decoding its own layer; the undecodable higher layers are treated as noise. Thus the rate $R_i$ (bits per
channel use) intended for virtual receiver $i$ is

$$R_i = \log\left(1 + \frac{\gamma_i P_i}{1 + \gamma_i \sum_{j=i+1}^{M} P_j}\right), \quad i = 1, \dots, M, \tag{3}$$

where log is to base 2, and the term $\gamma_i \sum_{j=i+1}^{M} P_j$ represents the interference power from the higher
layers. Suppose $\gamma_k$ is the realized channel power gain, then the original receiver can decode layer $k$ and
all the layers below it. Hence the realized rate $R^{(k)}_{\mathrm{rlz}}$ at the original receiver is $R_1 + \cdots + R_k$.

Fig. 2. Layered broadcast coding with successive refinement.



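From the rate distortion function of a complex Gaussian source [21], the mean squared distortion is
$2^{-bR}$ when the source is described at a rate of $bR$ bits per symbol. Thus the realized distortion $D^{(k)}_{\mathrm{rlz}}$ of
the reconstructed source $\hat{s}$ is

$$D^{(k)}_{\mathrm{rlz}} = 2^{-bR^{(k)}_{\mathrm{rlz}}} = 2^{-b(R_1 + \cdots + R_k)}, \tag{4}$$

where the last equality follows from successive refinability. The expected distortion E[D] is obtained by
averaging over the probability mass function (pmf) of the fading distribution:

$$\mathrm{E}[D] = \sum_{i=0}^{M} p_i D^{(i)}_{\mathrm{rlz}} = \sum_{i=0}^{M} p_i \left( \prod_{j=1}^{i} \left( 1 + \frac{\gamma_j P_j}{1 + \gamma_j \sum_{k=j+1}^{M} P_k} \right) \right)^{-b}, \tag{5}$$

where $D^{(0)}_{\mathrm{rlz}} \triangleq 1$.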
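As an illustrative sketch (not taken from the paper), the layer rates in (3) and the expected distortion in (5) can be evaluated numerically as follows. The function names `layer_rates` and `expected_distortion`, and the argument `p0` for $\Pr\{\gamma = 0\}$, are hypothetical choices made for this example.

```python
import math


def layer_rates(gammas, powers):
    """Per-layer rates R_i in bits per channel use, as in (3).

    gammas: channel power gains in ascending order (gamma_1 < ... < gamma_M).
    powers: transmit power P_i allocated to each layer, in the same order.
    """
    rates = []
    for i, (gamma, power) in enumerate(zip(gammas, powers)):
        # Undecoded higher layers act as additional noise at virtual receiver i.
        interference = gamma * sum(powers[i + 1:])
        rates.append(math.log2(1.0 + gamma * power / (1.0 + interference)))
    return rates


def expected_distortion(gammas, probs, powers, b, p0=0.0):
    """Expected distortion E[D] as in (5).

    probs[i] is the probability of the state with gain gammas[i];
    p0 is Pr{gamma = 0}, for which nothing is decoded and the distortion is 1.
    """
    total = p0
    cumulative_rate = 0.0
    for rate, prob in zip(layer_rates(gammas, powers), probs):
        cumulative_rate += rate  # realized rate R_1 + ... + R_k
        total += prob * 2.0 ** (-b * cumulative_rate)
    return total
```

For instance, with two equiprobable states of gains 1 and 4, unit power in each layer, and $b = 1$, the two realized distortions are $1/1.5$ and $1/7.5$, so the expected distortion evaluates to 0.4.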

We begin by deriving the optimal power allocation $P^*_1, \dots, P^*_M$ among the layers to find the minimum
expected distortion $\mathrm{E}[D]^*$. Subsequently we generalize to consider minimizing a convex distortion cost
function. If a layer has an expected power gain of zero (i.e., $p_i \gamma_i = 0$), the layer is allocated zero
power; hence in the derivation we assume $p_i \gamma_i \neq 0$, for $i = 1, \dots, M$. Note that the expected distortion
is monotonically decreasing in the transmit power $P$, hence the power constraint can be taken as an
equality $\sum_{i=1}^{M} P_i = P$, and the optimization is formulated as:

$$\mathrm{E}[D]^* = \min_{P_1, \dots, P_M} \mathrm{E}[D] \quad \text{subject to} \quad P_i \ge 0, \;\; \sum_{i=1}^{M} P_i = P, \;\; i = 1, \dots, M. \tag{6}$$

We first consider the power allocation between two layers in the next section, then the analysis is extended 
to consider more than two layers in Section V. The layered source coding broadcast scheme can be 
straightforwardly extended to MISO/SIMO systems. The equivalent single-antenna channel distribution 
is found by using isotropic inputs at the transmitter for MISO systems, and performing maximal-ratio 
combining at the receiver for SIMO systems. 

IV. Two-Layer Optimal Power Allocation 

Suppose the channel fading distribution has only two states: the channel power gain realization is either
$\alpha$ or $\beta$, with $\beta > \alpha > 0$. The transmitter then sends two layers ($M = 2$) of codewords as shown in
Fig. 3. Let $T_1$ denote the total transmit power constraint, and $T_2$ denote the power allocated to layer 2;
the remaining power $T_1 - T_2$ is allocated to layer 1. The decodable rates for the virtual receivers are
denoted by $R_1, R_2$; with successive decoding, they are given as follows:

$$R_1 = \log\left(1 + \frac{\alpha (T_1 - T_2)}{1 + \alpha T_2}\right) \tag{7}$$

$$R_2 = \log(1 + \beta T_2). \tag{8}$$

Suppose we generalize slightly and consider the weighted distortion:

$$D_1 \triangleq u\, 2^{-bR_1} + w\, 2^{-b(R_1 + R_2)}, \tag{9}$$

where the weights $\{u, w\}$ are non-negative. Note that the weighted distortion $D_1$ is the expected distortion
E[D] when the weights $\{u, w\}$ are the probabilities of the fading realizations.

Given $T_1$, the total power available to the two layers, we optimize over $T_2$ to minimize the weighted
distortion:

$$D_1^* \triangleq \min_{T_2 \in [0, T_1]} D_1 \tag{10}$$

$$= \min_{T_2 \in [0, T_1]} \left( \frac{1 + \alpha T_1}{1 + \alpha T_2} \right)^{-b} \left[ u + (1 + \beta T_2)^{-b} w \right]. \tag{11}$$

The minimization can be solved by the Lagrange method. We form the Lagrangian:

$$L(T_2, \lambda_1, \lambda_2) = D_1 + \lambda_1 (T_2 - T_1) - \lambda_2 T_2. \tag{12}$$

Fig. 3. Power allocation between two layers.



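Applying the Karush-Kuhn-Tucker (KKT) necessary conditions, the gradient of the Lagrangian vanishes
at the optimal power allocation $T_2^*$. Specifically, the KKT conditions stipulate that at $T_2^*$, either one of
the inequality constraints is active, or $dD_1/dT_2 = 0$. Only one solution satisfies the KKT conditions,
which leads to the optimal power allocation:

$$T_2^* = \min(U_2, T_1) = \begin{cases} U_2 & \text{if } U_2 \le T_1 \qquad \text{(13a)} \\ T_1 & \text{else,} \qquad \text{(13b)} \end{cases}$$

where

$$U_2 \triangleq \begin{cases} 0 & \text{if } \beta/\alpha \le 1 + u/w \\ \dfrac{1}{\beta} \left( \left[ \dfrac{w}{u} \left( \dfrac{\beta}{\alpha} - 1 \right) \right]^{\frac{1}{1+b}} - 1 \right) & \text{else.} \end{cases} \tag{14}$$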


In Section V-D, we show that the distortion minimization can be posed as a convex optimization problem;
hence the KKT conditions are necessary and sufficient for optimality (we assume the given total power
is non-zero so Slater's condition holds). Interestingly, $U_2$ depends only on the layer parameters $w, \beta, u, \alpha$
(which are derived from the channel fading distribution) and the bandwidth ratio $b$, but not on the total
power $T_1$. In other words, the higher layer is allocated a fixed amount of power as long as there is
sufficient power available. The optimal power allocation, therefore, adopts a simple policy: first assign
power to the higher layer up to a ceiling of $U_2$, then assign all remaining power, if any, to the lower
layer.
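The ceiling policy can be sketched in a few lines of Python. This is a minimal illustration under the definitions in (11), (13), and (14); the function names are hypothetical, and the handling of the degenerate weights $u = 0$ (ceiling never binds) and $w = 0$ (no power to the top layer) is an assumption added here for robustness rather than a case treated in the text.

```python
import math


def power_ceiling(u, w, alpha, beta, b):
    """Unconstrained minimizer U_2 of the weighted distortion, as in (14)."""
    if w <= 0.0:
        return 0.0            # assumption: no weight on the top layer
    if u <= 0.0:
        return math.inf       # assumption: no weight on the lower layer
    if beta / alpha <= 1.0 + u / w:
        return 0.0
    return (((w / u) * (beta / alpha - 1.0)) ** (1.0 / (1.0 + b)) - 1.0) / beta


def weighted_distortion(t1, t2, u, w, alpha, beta, b):
    """Weighted distortion D_1 in (11) for total power t1 and top-layer power t2."""
    scale = ((1.0 + alpha * t1) / (1.0 + alpha * t2)) ** (-b)
    return scale * (u + w * (1.0 + beta * t2) ** (-b))


def two_layer_allocation(t1, u, w, alpha, beta, b):
    """Return (T_2^*, D_1^*): fill the top layer up to the ceiling, as in (13)."""
    t2 = min(power_ceiling(u, w, alpha, beta, b), t1)
    return t2, weighted_distortion(t1, t2, u, w, alpha, beta, b)
```

With $u = w = 0.5$, $\alpha = 1$, $\beta = 5$, and $b = 1$, the ceiling is $(\sqrt{4} - 1)/5 = 0.2$. For $T_1 = 1$ the top layer receives 0.2 and the lower layer the remaining 0.8, while for $T_1 = 0.1$ the top layer receives all of the power.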

Under optimal power allocation $T_2^*$, the minimum weighted distortion as a function of the total power
$T_1$ is given by

$$D_1^* = \begin{cases} (1 + \alpha T_1)^{-b}\, W_1 & \text{if } U_2 \le T_1 \qquad \text{(15a)} \\ u + (1 + \beta T_1)^{-b}\, w & \text{else,} \qquad \text{(15b)} \end{cases}$$

where

$$W_1 \triangleq (1 + \alpha U_2)^{b} \left[ u + (1 + \beta U_2)^{-b} w \right]. \tag{16}$$

Note that when the total power constraint $T_2 \le T_1$ is not active (15a), the consequent minimum weighted
distortion is analogous to that of a single layer with channel power gain $\alpha$ and an equivalent weight
$W_1$. On the other hand, when the total power constraint is active (15b), it is equivalent to one with
channel power gain $\beta$ and an equivalent weight $w$ (with an additive constant $u$ in the distortion). Hence
under optimal power allocation, with respect to the minimum expected distortion, the two layers can be
represented by a single aggregate layer; this idea is explored further when we consider multiple layers
in Section V.

The optimal power allocation and the minimum expected distortion for a channel that has two discrete
fading states are shown in Fig. 4 and Fig. 5, respectively, where $T_2^*$, the power assigned to the top layer,
is computed under the parameters:

$$w = p_2, \quad \beta = \gamma_2, \quad u = 1 - p_2, \quad \alpha = 1. \tag{17}$$

Fig. 4 shows that when the channel power gain $\gamma_2$ is small, an increase in $\gamma_2$ leads to a larger power
allocation $T_2^*$ at the top layer, up to the total available power $T_1$. However, as $\gamma_2$ further increases, $T_2^*$
begins to fall. This is because the transmission power in the higher layer is in effect interference to the
lower layer. When the top layer has a strong channel, the overall expected distortion is dominated by the
bottom layer, in which case it is more beneficial to distribute the power to minimize the interference.
Fig. 5 plots the two-layer minimum expected distortion on a logarithmic scale, and it shows that the
relative power gain of the channels has only a marginal impact on the expected distortion, as the overall
distortion is in general dominated by the weaker channel.
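The rise and fall of the top-layer power described above can be reproduced with a short numerical sketch. It evaluates (13) and (14) under the parameters in (17); the function name `top_layer_power` and the sample values ($p_2 = 0.5$, $b = 1$, $T_1 = 1$) are assumptions chosen for illustration and are not necessarily the settings used for Fig. 4.

```python
def top_layer_power(gamma2, p2, total_power, b):
    """T_2^* from (13)-(14) with w = p2, beta = gamma2, u = 1 - p2, alpha = 1.

    Assumes 0 < p2 < 1 so that both weights are strictly positive.
    """
    w, u, alpha, beta = p2, 1.0 - p2, 1.0, gamma2
    if beta / alpha <= 1.0 + u / w:
        return 0.0  # top layer too weak relative to its weight: no power
    ceiling = (((w / u) * (beta / alpha - 1.0)) ** (1.0 / (1.0 + b)) - 1.0) / beta
    return min(ceiling, total_power)


if __name__ == "__main__":
    for gamma2 in (2.0, 5.0, 7.0, 10.0, 17.0, 50.0):
        print(gamma2, round(top_layer_power(gamma2, 0.5, 1.0, 1.0), 4))
```

Under these assumed values the ceiling is zero at $\gamma_2 = 2$, grows to about 0.207 near $\gamma_2 = 7$, and then decreases again as $\gamma_2$ increases, consistent with the qualitative behavior described for Fig. 4.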

V. Multiple-Layer Power Allocation 

In this section we consider the case when the fading distribution has $M$ fading states as depicted in
Fig. 2, where $M$ is finite and $M > 2$. For notational convenience, we write the power assignment as a
cumulative sum starting from the top layer:

$$T_j \triangleq \sum_{i=j}^{M} P_i, \quad \text{for } j = 1, \dots, M. \tag{18}$$

The original power assignments $\{P_1, \dots, P_M\}$ can then be recovered from $\{T_1, \dots, T_M\}$ by taking their
differences. By definition, $T_1 = P$ is given; hence the optimization is over the variables $T_2, \dots, T_M$:

$$\mathrm{E}[D]^* = \min_{T_2, \dots, T_M} \mathrm{E}[D] \quad \text{subject to} \quad 0 \le T_M \le \cdots \le T_2 \le P. \tag{19}$$

Fig. 5. Two-layer minimum expected distortion ($p_2 = 0.5$).



A. Expected Distortion Recurrence Relations 

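In terms of the cumulative power variables $T_1, \dots, T_M$, the expected distortion in (5) can be written
as

$$\mathrm{E}[D] = \sum_{i=1}^{M} p_i \prod_{j=1}^{i} \left( \frac{1 + \gamma_j T_j}{1 + \gamma_j T_{j+1}} \right)^{-b}, \tag{20}$$

where $T_{M+1} \triangleq 0$ (the constant $i = 0$ term $p_0$ in (5) does not depend on the power allocation and is left
out of (20)). We factor the sum of cumulative products in (20) and rewrite the expected distortion
as a set of recurrence relations:

$$D_M \triangleq (1 + \gamma_M T_M)^{-b}\, p_M \tag{21}$$

$$D_i \triangleq \left( \frac{1 + \gamma_i T_i}{1 + \gamma_i T_{i+1}} \right)^{-b} (p_i + D_{i+1}), \tag{22}$$

where $i$ runs from $M - 1$ down to 1. We refer to $D_i$ as the cumulative distortion, which represents the
cumulative effects on the expected distortion from layers $i$ and above, with $D_1 = \mathrm{E}[D]$. Note that given
$D_{i+1}$ from the previous recurrence step, the term $D_i$ depends on only two adjacent power allocation
variables $T_i$ and $T_{i+1}$; therefore, in each recurrence step $i$, we solve for the optimal $T^*_{i+1}$ in terms of $T_i$:

$$D^*_M \triangleq D_M \tag{23}$$

$$D^*_i \triangleq \min_{T_{i+1} \in [0, T_i]} \left( \frac{1 + \gamma_i T_i}{1 + \gamma_i T_{i+1}} \right)^{-b} (p_i + D^*_{i+1}). \tag{24}$$

In the last recurrence step ($i = 1$), the minimum expected distortion $\mathrm{E}[D]^*$ is then given by $D^*_1$.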
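The equivalence between the sum of cumulative products in (20) and the backward recurrence in (21) and (22) can be checked numerically. The following sketch uses hypothetical function names and sums over the $M$ non-zero fading states only.

```python
def cumulative_powers(powers):
    """Cumulative sums T_j = P_j + ... + P_M as in (18), returned as [T_1, ..., T_M]."""
    total, reversed_sums = 0.0, []
    for power in reversed(powers):
        total += power
        reversed_sums.append(total)
    return reversed_sums[::-1]


def expected_distortion_direct(gammas, probs, T, b):
    """Sum of cumulative products in (20), with T_{M+1} = 0."""
    T_ext = list(T) + [0.0]
    total, product = 0.0, 1.0
    for j, (gamma, prob) in enumerate(zip(gammas, probs)):
        product *= ((1.0 + gamma * T_ext[j]) / (1.0 + gamma * T_ext[j + 1])) ** (-b)
        total += prob * product
    return total


def expected_distortion_recurrence(gammas, probs, T, b):
    """Backward recurrence (21)-(22); the returned value is D_1."""
    M = len(gammas)
    D = (1.0 + gammas[-1] * T[-1]) ** (-b) * probs[-1]  # D_M in (21)
    for i in range(M - 2, -1, -1):                       # layers M-1 down to 1
        ratio = (1.0 + gammas[i] * T[i]) / (1.0 + gammas[i] * T[i + 1])
        D = ratio ** (-b) * (probs[i] + D)               # D_i in (22)
    return D
```

For gains $(1, 4)$, probabilities $(0.5, 0.5)$, powers $(1, 1)$, and $b = 1$, both functions return 0.4: the recurrence gives $D_2 = 0.5/5 = 0.1$ and then $D_1 = (2/3)(0.5 + 0.1) = 0.4$.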

B. Reduction through Optimal Power Allocation 

We consider the layers from top to bottom. In each recurrence step, the minimum distortion $D^*_i$ in (24)
can be found by optimally allocating power between two adjacent layers as described in Section IV. In
the first recurrence step ($i = M - 1$), we consider the power allocation between the topmost two layers.
The minimum distortion $D^*_{M-1}$ is found by setting the parameters in (11) to be:

$$w_{M-1} = p_M, \quad \beta_{M-1} = \gamma_M, \quad u_{M-1} = p_{M-1}, \quad \alpha_{M-1} = \gamma_{M-1}, \tag{25}$$

where the subscripts on the layer parameters $w, \beta, u, \alpha$ designate the recurrence step. In general, in
recurrence step $i$, the power allocation between layer $i$ and layer $i + 1$ can be found by the optimization:

$$D^*_i = \min_{T_{i+1} \in [0, T_i]} \left( \frac{1 + \alpha_i T_i}{1 + \alpha_i T_{i+1}} \right)^{-b} \left[ u_i + (1 + \beta_i T_{i+1})^{-b} w_i \right], \tag{26}$$

the solution of which is given in (15):

$$D^*_i = \begin{cases} (1 + \alpha_i T_i)^{-b}\, W_i & \text{if } U_{i+1} \le T_i \qquad \text{(27a)} \\ u_i + (1 + \beta_i T_i)^{-b}\, w_i & \text{else.} \qquad \text{(27b)} \end{cases}$$

There are two cases to the solution of $D^*_i$. In the first case, the power allocation is not constrained
by the available power $T_i$, and we substitute (27a) in the recurrence relation (24) to find the minimum
distortion in the next recurrence step $i - 1$:

$$D^*_{i-1} = \min_{T_i \in [0, T_{i-1}]} \left( \frac{1 + \gamma_{i-1} T_{i-1}}{1 + \gamma_{i-1} T_i} \right)^{-b} \left[ p_{i-1} + (1 + \alpha_i T_i)^{-b}\, W_i \right]. \tag{28}$$
The minimization in (28) has the same form as the one in (26), but with the following parameters:

$$w_{i-1} = W_i, \quad \beta_{i-1} = \alpha_i, \quad u_{i-1} = p_{i-1}, \quad \alpha_{i-1} = \gamma_{i-1}. \tag{29}$$

Hence the minimization can be solved the same way as in the last recurrence step. In the second case,
the power allocation is constrained by the available power $T_i$, thus we instead substitute (27b) in (24) in
the next recurrence step $i - 1$:

$$D^*_{i-1} = \min_{T_i \in [0, T_{i-1}]} \left( \frac{1 + \gamma_{i-1} T_{i-1}}{1 + \gamma_{i-1} T_i} \right)^{-b} \left[ p_{i-1} + u_i + (1 + \beta_i T_i)^{-b}\, w_i \right], \tag{30}$$

which again has the same form as in (26), with the following parameters:

$$w_{i-1} = w_i, \quad \beta_{i-1} = \beta_i, \quad u_{i-1} = p_{i-1} + u_i, \quad \alpha_{i-1} = \gamma_{i-1}. \tag{31}$$

Therefore, in each recurrence step, the two-layer optimization procedure described in Section IV can be
used to find the minimum distortion and the optimal power allocation between the current layer and the
aggregate higher layer.

C. Feasibility of Unconstrained Minimizer 

When we proceed to the next recurrence step, however, it is necessary to determine which set of
parameters in (29), (31) should be applied. Note that in the optimization in (26), if the available power
$T_i$ is unlimited (i.e., $T_i = \infty$), then the optimal power allocation is $T^*_{i+1} = U_{i+1}$ as given in (13a); hence
$U_{i+1}$ is the unconstrained minimizer of $D_i$. Consequently, we can first assume the minimization in (26)
is unconstrained by $T_i$ and its solution is given by (27a). If the unconstrained allocation $U_{i+1}$ is found to
be feasible, then it is indeed the optimal allocation. On the other hand, if $U_{i+1}$ is subsequently shown to
be infeasible, then we backtrack to the minimization in (26) and adopt the constrained solution given by
(27b). In this case, $T^*_{i+1} = T_i$ as given in (13b), which implies layer $i$ is inactive since $P^*_i = T_i - T^*_{i+1} = 0$.

We ascertain the feasibility of $U_{i+1}$ by verifying that it does not exceed the available power allocation $T_i$
from the lower layer $i$, which in turn depends on the power allocation $T_{i-1}$ from the next lower layer $i - 1$
and so on. The procedure can be accomplished by the recursive algorithm shown in Algorithm 1 in the
Appendix. We start by allocating power between the topmost two layers (line 1). In each recursion
step, we compute the unconstrained allocation $U$ (line 3). If $U$ does not exceed the total power $P$, we
first assume it is feasible, and proceed in the recursion to find the power allocation $T^*$ from the lower
layer (line 10). If $U$ turns out to be infeasible, then we repeat the allocation step with the constrained
minimization parameters (line 16). The recursion continues until the bottom layer is reached (line 4). In
the best case, if the unconstrained allocations for all layers are feasible, the algorithm has complexity
$O(M)$. In the worst case, if all unconstrained allocations are infeasible, each recursion step performs two
power allocations and the algorithm has complexity $O(2^M)$.
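The recursion described above can be sketched as follows. This is an illustrative reading of the procedure based on (14), (16), (29), and (31); it is not a transcription of Algorithm 1, whose line numbering is not reproduced, and the function names are hypothetical. The sketch assumes strictly positive probabilities and strictly increasing gains.

```python
def power_ceiling(u, w, alpha, beta, b):
    """Unconstrained minimizer from (14); assumes u > 0 and w > 0."""
    if beta / alpha <= 1.0 + u / w:
        return 0.0
    return (((w / u) * (beta / alpha - 1.0)) ** (1.0 / (1.0 + b)) - 1.0) / beta


def multilayer_allocation(gammas, probs, total_power, b):
    """Return cumulative powers [T_1, ..., T_M] with T_1 = total_power.

    gammas: ascending gains gamma_1 < ... < gamma_M; probs: matching p_1..p_M.
    """
    M = len(gammas)
    if M == 1:
        return [total_power]

    def allocate(i, w, beta, carry):
        # Step i (1-based): split power between layer i and the aggregate above.
        u = probs[i - 1] + carry
        alpha = gammas[i - 1]
        ceiling = power_ceiling(u, w, alpha, beta, b)
        if i == 1:  # bottom layer reached: T_1 is the given total power
            return [total_power, min(ceiling, total_power)]
        if ceiling <= total_power:
            # Tentatively treat the ceiling as feasible: parameters (29), W from (16).
            W = (1.0 + alpha * ceiling) ** b * (u + (1.0 + beta * ceiling) ** (-b) * w)
            lower = allocate(i - 1, W, alpha, 0.0)
            if ceiling <= lower[-1]:
                return lower + [ceiling]
        # Ceiling infeasible: layer i is inactive, parameters (31), T_{i+1} = T_i.
        lower = allocate(i - 1, w, beta, u)
        return lower + [lower[-1]]

    return allocate(M - 1, probs[M - 1], gammas[M - 1], 0.0)


def expected_distortion(gammas, probs, T, b):
    """Expected distortion over the non-zero states via the recurrence (21)-(22)."""
    D = (1.0 + gammas[-1] * T[-1]) ** (-b) * probs[-1]
    for i in range(len(gammas) - 2, -1, -1):
        ratio = (1.0 + gammas[i] * T[i]) / (1.0 + gammas[i] * T[i + 1])
        D = ratio ** (-b) * (probs[i] + D)
    return D
```

As a worked case, for gains $(1, 5, 20)$, probabilities $(0.3, 0.4, 0.3)$, $P = 2$, and $b = 1$, the ceilings are $U_3 = 0.025$ and $U_2 = 0.4$, both feasible, so the sketch returns $T = (2, 0.4, 0.025)$ with an expected distortion of 0.245. Changing the middle gain to 1.2 makes $U_2 = 0 < U_3$, which triggers the backtracking branch and leaves layer 2 inactive.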

D. Convex Distortion Cost Function 

In the previous sections, we consider minimizing a linear objective function of the possible distortion
realizations $D^{(k)}_{\mathrm{rlz}}$'s; in particular, we consider the expected distortion $\mathrm{E}[D] = \sum_{k=0}^{M} p_k D^{(k)}_{\mathrm{rlz}}$. Analytical
solutions are presented that characterize the optimal power allocation that minimizes the expected distortion.
However, the expected distortion E[D] does not capture a user's sensitivity regarding the uncertainty in the
range of possible outcomes of the distortion realizations $D^{(k)}_{\mathrm{rlz}}$'s. In this section, we present a numerical
optimization framework in which a wider class of objective functions is permissible. Specifically, we
consider the minimization of a distortion cost function $J(D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}})$, where $J(\cdot)$ is convex in
$D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}}$. We show that the distortion minimization can be formulated as a convex problem;
hence its solution can be computed efficiently by standard numerical methods in convex optimization
[24].

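In terms of the cumulative power variables $T_j$'s defined in (18), the distortion realization $D^{(k)}_{\mathrm{rlz}}$ given
in (4) can be written as:

$$D^{(k)}_{\mathrm{rlz}} = \prod_{j=1}^{k} \left( \frac{1 + \gamma_j T_j}{1 + \gamma_j T_{j+1}} \right)^{-b}, \quad k = 1, \dots, M, \tag{32}$$

where $T_{M+1} \triangleq 0$ as previously defined. Note that in (32), $D^{(k)}_{\mathrm{rlz}}$ can be written in terms of $D^{(k-1)}_{\mathrm{rlz}}$ as
follows:

$$D^{(k)}_{\mathrm{rlz}} = D^{(k-1)}_{\mathrm{rlz}} \left( \frac{1 + \gamma_k T_k}{1 + \gamma_k T_{k+1}} \right)^{-b}, \tag{33}$$

where again recall $D^{(0)}_{\mathrm{rlz}} \triangleq 1$. Next, we rearrange (33) and write:

$$T_j = \left( \frac{1}{\gamma_j} + T_{j+1} \right) \left( \frac{D^{(j)}_{\mathrm{rlz}}}{D^{(j-1)}_{\mathrm{rlz}}} \right)^{-1/b} - \frac{1}{\gamma_j}, \quad j = 1, \dots, M, \tag{34}$$

which we expand recursively initializing from $j = 1$ to arrive at an expression for the total power
$T_1 = \sum_{i=1}^{M} P_i$, which is given by:

$$T_1 = -\frac{1}{\gamma_1} + \sum_{j=1}^{M} \left( \frac{1}{\gamma_j} - \frac{1}{\gamma_{j+1}} \right) \left( D^{(j)}_{\mathrm{rlz}} \right)^{-1/b}, \tag{35}$$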
where $\gamma_{M+1} \triangleq \infty$. Under the power constraint $\sum_{i=1}^{M} P_i \le P$, the distortion cost function minimization
problem can be formulated as:

$$\text{minimize} \quad J(D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}}) \tag{36}$$

$$\text{over} \quad D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}}$$

$$\text{subject to} \quad -\frac{1}{\gamma_1} + \sum_{j=1}^{M} \left( \frac{1}{\gamma_j} - \frac{1}{\gamma_{j+1}} \right) \left( D^{(j)}_{\mathrm{rlz}} \right)^{-1/b} \le P, \tag{37}$$

$$0 \le D^{(M)}_{\mathrm{rlz}} \le \cdots \le D^{(1)}_{\mathrm{rlz}} \le 1, \tag{38}$$

where (37) corresponds to the power constraint from (35), and (38) corresponds to the realized rates
$R_k$'s being nonnegative in (4). Note that the constraints (37), (38) are convex: in (37), $(D^{(j)}_{\mathrm{rlz}})^{-1/b}$ is
convex, and $1/\gamma_j - 1/\gamma_{j+1} \ge 0$, which follows from the system model assumptions $\gamma_M > \cdots > \gamma_1 > 0$.

Therefore, the feasible distortion region $\{D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}}\}$, as characterized by (37)-(38), is convex.
The objective function $J(D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}})$ in (36) is convex by assumption. It follows that minimizing
$J(D^{(1)}_{\mathrm{rlz}}, \dots, D^{(M)}_{\mathrm{rlz}})$ over a convex region is a convex optimization problem: it can be efficiently solved
by standard convex optimization numerical methods. For instance, the optimization problem above can
be solved using the CVX software package [25], [26].
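The change of variables behind this formulation can be sanity-checked numerically: mapping cumulative powers to distortions with (32) and then evaluating the left-hand side of (37) should recover the total power $T_1$. The sketch below is a plain-Python check with hypothetical function names; it does not solve the optimization itself and makes no assumption about the CVX interface.

```python
def distortions_from_powers(gammas, T, b):
    """Realized distortions D^(1), ..., D^(M) from cumulative powers via (32)."""
    T_ext = list(T) + [0.0]  # T_{M+1} = 0
    distortions, product = [], 1.0
    for k, gamma in enumerate(gammas):
        product *= ((1.0 + gamma * T_ext[k]) / (1.0 + gamma * T_ext[k + 1])) ** (-b)
        distortions.append(product)
    return distortions


def total_power_from_distortions(gammas, D, b):
    """Left-hand side of (37): total power needed to realize the distortions D."""
    inv = [1.0 / gamma for gamma in gammas] + [0.0]  # 1/gamma_{M+1} = 0
    return -inv[0] + sum(
        (inv[j] - inv[j + 1]) * D[j] ** (-1.0 / b) for j in range(len(gammas))
    )


def is_feasible(gammas, D, b, P, tol=1e-9):
    """Check the power constraint (37) and the ordering constraint (38)."""
    ordered = all(D[k] >= D[k + 1] for k in range(len(D) - 1))
    in_range = D[0] <= 1.0 and D[-1] >= 0.0
    return ordered and in_range and total_power_from_distortions(gammas, D, b) <= P + tol
```

For gains $(1, 5, 20)$, cumulative powers $(2, 0.4, 0.025)$, and $b = 1$, the distortions are approximately $(0.4667, 0.175, 0.1167)$, and the power expression evaluates to $-1 + 1.7143 + 0.8571 + 0.4286 = 2$, matching $T_1$.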

For example, to characterize the user's sensitivity to the uncertainty in the realized distortion, we may 
consider a risk-sensitive distortion cost function: 

$$J_\varphi\bigl(D_{\mathrm{rlz}}^{(1)}, \dots, D_{\mathrm{rlz}}^{(M)}\bigr) \triangleq \mathrm{E}[D] + \varphi\, \mathrm{VAR}[D], \tag{39}$$

where VAR[D] denotes the distortion variance:

$$\mathrm{VAR}[D] \triangleq \mathrm{E}\bigl[ (D - \mathrm{E}[D])^2 \bigr] \tag{40}$$

$$= \sum_{k=0}^{M} p_k \Bigl( D_{\mathrm{rlz}}^{(k)} - \sum_{i=0}^{M} p_i D_{\mathrm{rlz}}^{(i)} \Bigr)^2. \tag{41}$$

Note that $J_\varphi(\cdot)$ is a convex function of $D_{\mathrm{rlz}}^{(1)}, \dots, D_{\mathrm{rlz}}^{(M)}$. In (39), $\varphi \ge 0$ is a given scalar constant,
which represents the risk-aversion parameter [24]. Accordingly, we may specify a suitable value of $\varphi$
to model the user's willingness to trade off an increase in the expected distortion E[D] in return for a
reduction in the distortion variance VAR[D]. Under the convex optimization formulation in (36)-(38),
we may additionally consider convex constraints on $D_{\mathrm{rlz}}^{(1)}, \dots, D_{\mathrm{rlz}}^{(M)}$. For example, to guarantee worst-case
performance under unfavorable fading states, we may consider the following maximum distortion
or variance constraints:

$$\mathrm{E}[D] \le D_{\max} \tag{42}$$

$$\mathrm{VAR}[D] \le V_{\max} \tag{43}$$

$$D_{\mathrm{rlz}}^{(k)} \le D_{\max}^{(k)}, \quad k = 1, \dots, M, \tag{44}$$

where $D_{\max}$, $V_{\max}$, and $D_{\max}^{(k)}$'s are given constants. In Section VI, numerical examples are presented
where we minimize the risk-sensitive distortion cost function $J_\varphi(\cdot)$ under discretized Rayleigh fading.
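
As an illustration of the cost in (39), the short Python sketch below evaluates the expected distortion, the variance, and the risk-sensitive cost for a given channel pmf and a given set of realized distortions. The function name and the sample numbers are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def risk_sensitive_cost(pmf, distortions, phi):
    """Return (E[D], VAR[D], E[D] + phi * VAR[D]).

    pmf[k] is the probability that state k is realized and distortions[k]
    is the realized distortion in that state, for k = 0..M.
    """
    mean = sum(p * d for p, d in zip(pmf, distortions))
    var = sum(p * (d - mean) ** 2 for p, d in zip(pmf, distortions))
    return mean, var, mean + phi * var


pmf = [0.2, 0.3, 0.5]            # illustrative p_0, p_1, p_2
distortions = [1.0, 0.5, 0.25]   # illustrative D^(0), D^(1), D^(2)
mean, var, cost = risk_sensitive_cost(pmf, distortions, phi=2.0)
```

A larger `phi` weights the variance term more heavily, so an allocation that lowers the spread of the realized distortions can be preferred even when its mean is slightly higher.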

VI. Discretized Rayleigh Fading Distribution 

In this section, we present numerical results produced by the multiple-layer power allocation algorithms 
described in Section V. In the examples, we assume the channel pmf is taken from a discretized Rayleigh 
fading distribution. Specifically, for a channel under Rayleigh fading with unit power, the channel power
gain $\gamma$ is exponentially distributed with unit mean, and its probability density function (pdf) is given by

$$f(\gamma) = e^{-\gamma}, \quad \text{for } \gamma \ge 0. \tag{45}$$

We truncate the pdf at $\gamma = \Gamma$, quantize $\gamma$ into $M$ evenly spaced levels:

$$\gamma_i \triangleq i\, \Gamma / M, \quad \text{for } i = 1, \dots, M, \tag{46}$$

with $\gamma_0 \triangleq 0$, and discretize the probability distribution of $\gamma$ to the closest lower level $\gamma_i$:

$$p_i \triangleq \int_{i\Gamma/M}^{(i+1)\Gamma/M} f(\gamma)\, d\gamma, \quad \text{for } i = 0, \dots, M-1, \tag{47}$$

$$p_M \triangleq \int_{\Gamma}^{\infty} f(\gamma)\, d\gamma. \tag{48}$$

While it is possible to consider the optimal discretization of a fading distribution that minimizes expected 
distortion [16], [17], [27], in this paper we assume the channel pmf is given and do not consider such a 
step. 
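
The discretization in (46)-(48) has a closed form for the unit-mean exponential pdf, since the integral of $e^{-\gamma}$ over an interval is a difference of exponentials. A minimal Python sketch follows; the function name is an assumption made for illustration.

```python
import math


def discretized_rayleigh_pmf(trunc, M):
    """Return (levels, pmf) for f(g) = exp(-g), following (46)-(48).

    levels[i] = i * trunc / M for i = 0..M, and pmf[i] is the probability
    mass mapped to the closest lower level levels[i].
    """
    levels = [i * trunc / M for i in range(M + 1)]
    # Integral of exp(-g) over [a, b] equals exp(-a) - exp(-b).
    pmf = [math.exp(-levels[i]) - math.exp(-levels[i + 1]) for i in range(M)]
    pmf.append(math.exp(-trunc))  # tail mass beyond the truncation point
    return levels, pmf


levels, pmf = discretized_rayleigh_pmf(trunc=2.0, M=24)
```

The call above uses a truncation point of 2 and 24 levels; any other positive truncation point and level count can be substituted.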

The optimal power allocation that minimizes the expected distortion E[D] for the discretized Rayleigh
fading pmf is shown in Fig. 6 and Fig. 7. The Rayleigh fading pdf is truncated at $\Gamma = 2$, and discretized
into $M = 24$ levels. The truncation is justified by the observation that in the output the highest layers
near $\Gamma$ are not assigned any power. Fig. 6 plots the optimal power allocation $P_i^*$'s for different layers
(indexed by the channel power gain $\gamma_i$) at SNRs $P = 0$ dB, 5 dB, and 10 dB, with the bandwidth ratio
$b = 1$. We observe that the highest layers are inactive ($P_i^* = 0$), and within the range of active layers
a lower layer is in general allocated more power than a higher layer, except at the lowest active layer
where it is assigned the remaining power. As SNR increases, the power allocations of the higher layers
are unaltered, but the range of active layers extends further into the lower layers. On the other hand,
Fig. 7 plots the allocation $P_i^*$'s for different bandwidth ratios $b = 0.5, 1, 2$ at the SNR of 0 dB. It can
be observed that a higher $b$ (i.e., more channel uses per source symbol) has the effect of spreading the
power allocation further across into the lower layers.

Fig. 6. Optimal power allocation that minimizes the expected distortion E[D] (b = 1).

Fig. 7. Optimal power allocation that minimizes the expected distortion E[D] (P = 0 dB).

Intuitively, the higher layers have stronger channels but suffer from larger risks of being in outage,
while the lower layers provide higher reliability but at the expense of having to cope with less power-efficient
channels. Accordingly the optimal power allocation is concentrated around the middle layers.
Furthermore, as SNR increases, the numerical results suggest that, to minimize the expected distortion
in a Rayleigh fading channel, it is more favorable to utilize the weaker channels with the extra power
rather than accepting the larger outage risks from the higher layers.

Fig. 8. Minimum expected distortion under optimal power allocation.

The minimum expected distortion E[D]* under optimal power allocation is shown in Fig. 8 on a
logarithmic scale. When the bandwidth ratio $b$ is higher, E[D]* decreases as expected. However, the
improvement in E[D]* from refining the resolution $M$ of the discretization is almost negligible at low
SNRs. At high SNRs, on the other hand, the distortion is dominated by the outage probability:

$$P_{\mathrm{out}} \triangleq \Pr\{\gamma_0 \text{ is realized}\} \tag{49}$$

$$= \int_{0}^{\Gamma/M} f(\gamma)\, d\gamma, \tag{50}$$

which is decreasing in $M$. Therefore, when the SNR is sufficiently high, the expected distortion E[D]*
reaches a floor that is dictated by $P_{\mathrm{out}}$. This behavior is due to having evenly-spaced $\gamma_i$'s; the performance
could be improved by optimizing the quantization levels $\gamma_i$'s for the given $M$ layers.

As a comparison, we consider the expected distortion lower bounds when the system has CSI at the
transmitter (CSIT). Under the discretized Rayleigh fading pmf, suppose the realized channel power gain
is known to be $\gamma_k$; then it is optimal for the transmitter to concentrate all power on layer $k$ to achieve the
instantaneous distortion $D_{\mathrm{Q\text{-}CSIT}} = (1 + \gamma_k P)^{-b}$. Thus with the quantized CSIT, the expected distortion
is given by

$$\mathrm{E}[D_{\mathrm{Q\text{-}CSIT}}] = \sum_{k=0}^{M} p_k (1 + \gamma_k P)^{-b}. \tag{51}$$

Fig. 9. Expected distortion lower bounds with CSIT (b = 0.5).

In terms of the original Rayleigh fading pdf $f(\gamma)$, with perfect CSIT, the expected distortion is similarly
given by

$$\mathrm{E}[D_{\mathrm{CSIT}}] = \int_{0}^{\infty} e^{-\gamma} (1 + \gamma P)^{-b}\, d\gamma, \tag{52}$$

where the definite integral can be evaluated numerically. The expected distortions are plotted in Fig. 9 
for the cases of no CSIT, quantized and perfect CSIT. It can be observed that at low SNRs, quantized 
CSIT is nearly as good as perfect CSIT, whereas at high SNRs, quantized CSIT provides only marginal 
improvement over no CSIT, as the expected distortion is dominated by the probability of outage. 
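
The two CSIT bounds can be evaluated with a few lines of Python. The sketch below is illustrative only: it assumes the unit-mean Rayleigh pdf, a truncation point of 2 with 24 levels as defaults, and a plain composite Simpson rule (the helper `simpson` and the other function names are not from the paper). The quantized bound is the sum of $p_k (1 + \gamma_k P)^{-b}$ over the discretized levels, and the perfect-CSIT bound is the integral of $e^{-\gamma}(1 + \gamma P)^{-b}$.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3


def quantized_csit_distortion(P, b, trunc=2.0, M=24):
    """Expected distortion with quantized CSIT for the discretized pmf."""
    total = 0.0
    for k in range(M + 1):
        g = k * trunc / M
        upper = math.exp(-(k + 1) * trunc / M) if k < M else 0.0
        p_k = math.exp(-g) - upper          # mass mapped to level k
        total += p_k * (1 + g * P) ** (-b)
    return total


def perfect_csit_distortion(P, b, upper=40.0):
    """Expected distortion with perfect CSIT; the tail beyond `upper` is below exp(-40)."""
    return simpson(lambda g: math.exp(-g) * (1 + g * P) ** (-b), 0.0, upper, 20000)


P, b = 10.0, 0.5                   # 10 dB SNR in linear scale
p_out = 1 - math.exp(-2.0 / 24)    # probability that the lowest state (gain 0) is realized
d_quantized = quantized_csit_distortion(P, b)
d_perfect = perfect_csit_distortion(P, b)
```

Because every gain is rounded down to its level, the quantized bound is never below the perfect-CSIT bound, and it cannot fall below `p_out`, which is the floor discussed above.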

Numerical examples that illustrate the minimization of a convex cost function of the possible distortion
realizations $D_{\mathrm{rlz}}^{(k)}$'s are shown in Fig. 10 and Fig. 11. We consider the risk-sensitive distortion cost function:
$J_\varphi(D_{\mathrm{rlz}}^{(1)}, \dots, D_{\mathrm{rlz}}^{(M)}) = \mathrm{E}[D] + \varphi\, \mathrm{VAR}[D]$. Fig. 10 shows the optimal power allocation that minimizes
$J_\varphi(\cdot)$ for different values of the risk-aversion parameter $\varphi$. It is observed that a large $\varphi$, which
represents the user's aversion to large variations in the realized distortion, shifts and concentrates the
power allocation towards the lower layers. Fig. 11 plots the corresponding expected distortion E[D] and
distortion variance VAR[D] under the allocation that minimizes $J_\varphi(\cdot)$. As $\varphi$ increases, it shows the tradeoff of accepting a
higher E[D] for the reduction in VAR[D].

Fig. 10. Optimal power allocation that minimizes $\mathrm{E}[D] + \varphi\, \mathrm{VAR}[D]$ (P = 0 dB, b = 0.5).

Fig. 11. Expected distortion and variance corresponding to $\min\{\mathrm{E}[D] + \varphi\, \mathrm{VAR}[D]\}$ (P = 0 dB, b = 0.5).

VII. Continuous Fading Distribution 

In this section, we consider continuous fading distributions, and we focus on the minimization of a
linear distortion cost function, the expected distortion, by extending the analytical optimal power
allocation solutions derived in Section V-B. We study the limiting process as the discretization resolution
of the fading distribution tends to infinity, and consider the optimal power distribution that minimizes
the expected distortion when the fading distribution of the channel is given by a continuous probability
density function. Specifically, we assume the layers are evenly spaced, with $\gamma_{i+1} - \gamma_i = \Delta\gamma$, and we
consider the limiting process as $\Delta\gamma \to 0$ to obtain the power distribution:

$$\rho(\gamma) \triangleq \lim_{\Delta\gamma \to 0} \frac{1}{\Delta\gamma} P_{\lceil \gamma / \Delta\gamma \rceil}, \tag{53}$$

where for discrete layers the power allocation $P_i$ is referenced by the integer layer index $i$, while the
continuous power distribution $\rho(\gamma)$ is indexed by the channel power gain $\gamma$.

Consider the optimal power allocation between layer $\gamma$ and its next lower layer $\gamma - \Delta\gamma$. Let $T(\gamma - \Delta\gamma)$
denote the available transmit power for layers $\gamma - \Delta\gamma$ and above, of which $T(\gamma)$ is allocated to layers $\gamma$
and above; the remaining power $T(\gamma - \Delta\gamma) - T(\gamma)$ is allocated to layer $\gamma - \Delta\gamma$. The optimal power
allocation $T^*(\gamma)$ is given by the solution to the two-layer optimization problem in Section IV, with the
parameters in Fig. 3 correspondingly set to be:

$$w = W(\gamma), \quad \beta = \gamma, \quad u = f(\gamma)\, \Delta\gamma, \quad \alpha = \gamma - \Delta\gamma, \tag{54}$$

where $f(\gamma)$ is the pdf of the channel power gain with $f(\gamma)\Delta\gamma$ representing the probability that layer
$\gamma - \Delta\gamma$ is realized, and $W(\gamma)$ is interpreted as an equivalent probability weight summarizing the aggregate
effect of the layers $\gamma$ and above.

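From (13), (14), the optimal power allocation is given by

$$T^*(\gamma) = \begin{cases} U(\gamma) & \text{if } U(\gamma) \le T(\gamma - \Delta\gamma) \quad \text{(55a)} \\ T(\gamma - \Delta\gamma) & \text{else,} \quad \text{(55b)} \end{cases}$$

where

$$U(\gamma) \triangleq \begin{cases} 0 & \text{if } \gamma \ge W(\gamma)/f(\gamma) + \Delta\gamma \quad \text{(56a)} \\ \dfrac{1}{\gamma} \left( \left[ \dfrac{W(\gamma)}{f(\gamma)(\gamma - \Delta\gamma)} \right]^{\frac{1}{1+b}} - 1 \right) & \text{else.} \quad \text{(56b)} \end{cases}$$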

We assume there is a region of $\gamma$ where the cumulative power allocation is not constrained by the power
available from the lower layers, i.e., $U(\gamma) \le U(\gamma - \Delta\gamma)$ and $U(\gamma) \le P$. In this region the optimal power
allocation $T^*(\gamma)$ is given by the unconstrained minimizer $U(\gamma)$ in (55a). In the solution to $U(\gamma)$ we need
to verify that $U(\gamma)$ is non-increasing in this region, which corresponds to the power distribution $\rho^*(\gamma)$
being non-negative. Following (15a), we write the cumulative distortion from layers $\gamma$ and above in the
form:

$$D^*(\gamma) = (1 + \gamma T(\gamma))^{-b}\, W(\gamma). \tag{57}$$

Substituting in the unconstrained cumulative power allocation $U(\gamma)$, the cumulative distortion at layer
$\gamma - \Delta\gamma$ becomes:

$$D^*(\gamma - \Delta\gamma) = \left( \frac{1 + (\gamma - \Delta\gamma)\, U(\gamma - \Delta\gamma)}{1 + (\gamma - \Delta\gamma)\, U(\gamma)} \right)^{-b} \left[ f(\gamma)\Delta\gamma + (1 + \gamma U(\gamma))^{-b}\, W(\gamma) \right], \tag{58}$$

which is of the form in (57) if we define $W(\gamma - \Delta\gamma)$ by the recurrence equation:

$$W(\gamma - \Delta\gamma) = \bigl(1 + (\gamma - \Delta\gamma)\, U(\gamma)\bigr)^{b} \left[ f(\gamma)\Delta\gamma + (1 + \gamma U(\gamma))^{-b}\, W(\gamma) \right]. \tag{59}$$



As the spacing between the layers condenses, in the limit of $\Delta\gamma$ approaching zero, the recurrence
equations (58), (59) become differential equations. The optimal power distribution $\rho^*(\gamma)$ is given by the
derivative of the cumulative power allocation:

$$\rho^*(\gamma) = -T^{*\prime}(\gamma), \tag{60}$$

where $T^*(\gamma)$ is described by solutions in three regions:

$$T^*(\gamma) = \begin{cases} 0 & \gamma > \gamma_o \quad \text{(61a)} \\ U(\gamma) & \gamma_P \le \gamma \le \gamma_o \quad \text{(61b)} \\ P & \gamma < \gamma_P. \quad \text{(61c)} \end{cases}$$

In region (61a) when $\gamma > \gamma_o$, corresponding to cases (55a) and (56a), no power is allocated to the layers
and (59) simplifies to $W(\gamma) = 1 - F(\gamma)$, where $F(\gamma) \triangleq \int_0^{\gamma} f(s)\, ds$ is the cumulative distribution function
(cdf) of the channel power gain. The boundary $\gamma_o$ is defined by the condition in (56a) which satisfies:

$$\gamma_o f(\gamma_o) + F(\gamma_o) - 1 = 0. \tag{62}$$

Under Rayleigh fading when $f(\gamma) = \bar{\gamma}^{-1} e^{-\gamma/\bar{\gamma}}$, where $\bar{\gamma}$ is the expected channel power gain, (62)
evaluates to $\gamma_o = \bar{\gamma}$. For other fading distributions, $\gamma_o$ may be computed numerically.
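
One simple way to compute the upper boundary numerically is bisection on the left-hand side of (62). The Python sketch below is an illustration under the assumption that this left-hand side is negative at the lower end of the bracket and positive at the upper end; the function name and the bracket are hypothetical choices. It is exercised on Rayleigh fading, where the root is known to equal the expected channel power gain.

```python
import math


def upper_boundary(pdf, cdf, lo, hi, tol=1e-12):
    """Solve g * pdf(g) + cdf(g) - 1 = 0 for g in [lo, hi] by bisection.

    Assumes the expression is negative at lo and positive at hi.
    """
    def h(g):
        return g * pdf(g) + cdf(g) - 1.0

    if not (h(lo) < 0.0 < h(hi)):
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mean_gain = 2.0  # illustrative expected channel power gain


def rayleigh_pdf(g):
    return math.exp(-g / mean_gain) / mean_gain


def rayleigh_cdf(g):
    return 1.0 - math.exp(-g / mean_gain)


gamma_o = upper_boundary(rayleigh_pdf, rayleigh_cdf, 1e-6, 10 * mean_gain)
```

For other fading distributions only the two callables and the bracket need to change.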

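In region (61b) when $\gamma_P \le \gamma \le \gamma_o$, corresponding to cases (55a) and (56b), the optimal power
distribution is described by a set of differential equations. We apply the first order binomial expansion
$(1 + \Delta\gamma)^b \cong 1 + b\,\Delta\gamma$, and (59) becomes:

$$W'(\gamma) = \lim_{\Delta\gamma \to 0} \frac{W(\gamma) - W(\gamma - \Delta\gamma)}{\Delta\gamma} \tag{63}$$

$$= \frac{b\, U(\gamma)}{1 + \gamma U(\gamma)}\, W(\gamma) - \bigl(1 + \gamma U(\gamma)\bigr)^{b} f(\gamma), \tag{64}$$

which we substitute in (56b) to obtain:

$$\frac{U'(\gamma)}{U(\gamma) + 1/\gamma} = -\frac{1}{1+b} \left( \frac{2}{\gamma} + \frac{f'(\gamma)}{f(\gamma)} \right). \tag{65}$$

Hence $U(\gamma)$ is described by a first order linear differential equation. With the initial condition $U(\gamma_o) = 0$,
its solution is given by

$$U(\gamma) = \frac{\displaystyle \int_{\gamma}^{\gamma_o} \frac{1}{s} \left( \frac{2}{s} + \frac{f'(s)}{f(s)} \right) \bigl[ s^2 f(s) \bigr]^{\frac{1}{1+b}}\, ds}{(1+b) \bigl[ \gamma^2 f(\gamma) \bigr]^{\frac{1}{1+b}}}, \tag{66}$$

and condition (55b) in the lowest active layer becomes the boundary condition $U(\gamma_P) = P$. In [20], the
power distribution in (66) is derived using the calculus of variations method.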
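Similarly, as $\Delta\gamma \to 0$, the evolution of the expected distortion in (58) becomes:

$$D'(\gamma) = -\frac{b \gamma\, U'(\gamma)}{1 + \gamma U(\gamma)}\, D(\gamma) - f(\gamma) \tag{67}$$

$$= \frac{b}{1+b} \left( \frac{2}{\gamma} + \frac{f'(\gamma)}{f(\gamma)} \right) D(\gamma) - f(\gamma), \tag{68}$$

which is again a first order linear differential equation. With the initial condition $D(\gamma_o) = W(\gamma_o) =
\gamma_o f(\gamma_o)$, its solution is given by

$$D(\gamma) = \int_{\gamma}^{\gamma_o} \left[ \left( \frac{s}{\gamma} \right)^{2} \frac{f(s)}{f(\gamma)} \right]^{\frac{-b}{1+b}} f(s)\, ds + \gamma_o f(\gamma_o) \left[ \left( \frac{\gamma_o}{\gamma} \right)^{2} \frac{f(\gamma_o)}{f(\gamma)} \right]^{\frac{-b}{1+b}}. \tag{69}$$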

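Under Rayleigh fading, for instance, the solutions to $U(\gamma)$ and $D(\gamma)$ are given by

$$U(\gamma) = \frac{\displaystyle \int_{\gamma}^{\bar{\gamma}} \frac{1}{s} \left( \frac{2}{s} - \frac{1}{\bar{\gamma}} \right) \bigl[ s^2 e^{-s/\bar{\gamma}} \bigr]^{\frac{1}{1+b}}\, ds}{(1+b) \bigl[ \gamma^2 e^{-\gamma/\bar{\gamma}} \bigr]^{\frac{1}{1+b}}} \tag{70}$$

$$D(\gamma) = \int_{\gamma}^{\bar{\gamma}} \frac{1}{\bar{\gamma}} \left[ \left( \frac{s}{\gamma} \right)^{2} e^{-(s-\gamma)/\bar{\gamma}} \right]^{\frac{-b}{1+b}} e^{-s/\bar{\gamma}}\, ds + e^{-1} \left[ \left( \frac{\bar{\gamma}}{\gamma} \right)^{2} e^{-(\bar{\gamma}-\gamma)/\bar{\gamma}} \right]^{\frac{-b}{1+b}}. \tag{71}$$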



The integrals in (70), (71) can be computed numerically by evaluating the incomplete gamma function. 

Finally, in region (61c) when $\gamma < \gamma_P$, corresponding to case (55b), the transmit power $P$ has been
exhausted, and no power is allocated to the remaining layers. Hence the minimum expected distortion is

$$\mathrm{E}[D]^* = D(0) = F(\gamma_P) + D(\gamma_P), \tag{72}$$

where the last equality follows from when $\gamma < \gamma_P$ in region (61c), $\rho^*(\gamma) = 0$ and $D(\gamma) = \int_{\gamma}^{\gamma_P} f(s)\, ds +
D(\gamma_P)$.
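
To make the three-region solution concrete, the Python sketch below evaluates the Rayleigh-fading expressions for the cumulative power and the cumulative distortion by numerical quadrature, locates the lower boundary by bisection on the condition that the cumulative power equals $P$, and then adds the outage mass below that boundary. It is a sketch under stated assumptions: a composite Simpson rule replaces the incomplete gamma function, the function names are hypothetical, and the bisection bracket presumes a moderate SNR so that the lower boundary is not extremely close to zero.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3


def cumulative_power(g, b, gbar=1.0):
    """U(g) under Rayleigh fading with mean gbar, for 0 < g <= gbar."""
    e = 1.0 / (1.0 + b)
    num = simpson(
        lambda s: (2 / s - 1 / gbar) / s * (s * s * math.exp(-s / gbar)) ** e,
        g, gbar)
    return num / ((1 + b) * (g * g * math.exp(-g / gbar)) ** e)


def cumulative_distortion(g, b, gbar=1.0):
    """D(g) under Rayleigh fading with mean gbar, for 0 < g <= gbar."""
    e = -b / (1.0 + b)
    integral = simpson(
        lambda s: ((s / g) ** 2 * math.exp(-(s - g) / gbar)) ** e
        * math.exp(-s / gbar) / gbar,
        g, gbar)
    boundary = math.exp(-1) * ((gbar / g) ** 2 * math.exp(-(gbar - g) / gbar)) ** e
    return integral + boundary


def min_expected_distortion(P, b, gbar=1.0):
    """Return (gamma_P, E[D]*), with gamma_P found from U(gamma_P) = P."""
    lo, hi = 1e-6 * gbar, gbar   # U decreases in g and vanishes at g = gbar
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cumulative_power(mid, b, gbar) > P:
            lo = mid
        else:
            hi = mid
    g_p = 0.5 * (lo + hi)
    outage_mass = 1 - math.exp(-g_p / gbar)      # F(gamma_P)
    return g_p, outage_mass + cumulative_distortion(g_p, b, gbar)


g_p, ed_min = min_expected_distortion(P=1.0, b=1.0)   # 0 dB SNR, b = 1
```

A useful internal check is that, inside the active region, the two quadratures should satisfy $D(\gamma) = \gamma f(\gamma)\,(1 + \gamma U(\gamma))$, which follows from combining (57) with the limit of (56b).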

VIII. Rayleigh Fading with Diversity 

In this section we consider the optimal power distribution and the minimum expected distortion when 
the wireless channel undergoes Rayleigh fading with a diversity order of L from the realization of 
independent fading paths. Specifically, we assume the fading channel is characterized by the Erlang 
distribution: 

$$f_L(\gamma) = \frac{(L/\bar{\gamma})^{L}\, \gamma^{L-1} e^{-L\gamma/\bar{\gamma}}}{(L-1)!}, \quad \gamma \ge 0, \tag{73}$$

which corresponds to the average of $L$ iid channel power gains, each under Rayleigh fading with an
expected value of $\bar{\gamma}$. The $L$-diversity system may be realized by having $L$ transmit antennas using
isotropic inputs, by relaxing the decode delay constraint over $L$ fading blocks, or by having $L$ receive
antennas under maximal-ratio combining when the power gain of each antenna is normalized by $1/L$.

Fig. 12. Span of active layers under optimal power distribution.

The optimal power distribution (61) concentrates the transmit power over a range of active layers; the
upper and lower boundaries $\gamma_o$, $\gamma_P$ of the span of the active layers are plotted in Fig. 12. A higher SNR
$P$ or a larger bandwidth ratio $b$ extends the span further into the lower layers but the upper boundary
$\gamma_o$ remains unperturbed. As $L$ increases, the fluctuation in the channel realization is diminished by the
diversity of combining multiple independent fading paths, and the power distribution becomes more
concentrated, albeit slowly. By the law of large numbers, at asymptotically large $L$, we expect all power
to be concentrated at $\bar{\gamma}$.

Fig. 13 shows the optimal power allocation $\rho^*(\gamma)$. It can be observed that a smaller bandwidth ratio $b$
reduces the spread of the power distribution. In fact, as $b$ approaches zero, the optimal power distribution
that minimizes expected distortion converges to the power distribution that maximizes expected capacity.
To show the connection, we take the limit in the distortion-minimizing cumulative power distribution in
(66):

$$\lim_{b \to 0} U(\gamma) = \frac{1 - F(\gamma) - \gamma f(\gamma)}{\gamma^2 f(\gamma)}, \tag{74}$$

which is equal to the capacity-maximizing cumulative power distribution as derived in [4]. Essentially,
from the first order expansion $e^{b} \cong 1 + b$ for small $b$, $\mathrm{E}[D] \cong 1 - b\,\mathrm{E}[C]$ when the bandwidth ratio is
small, where E[C] is the expected capacity in nats/s, and hence minimizing expected distortion becomes
equivalent to maximizing expected capacity. For comparison, the capacity-maximizing power distribution
is also plotted in Fig. 13. Note that the distortion-minimizing power distribution is more conservative, and
it is more so as $b$ increases, as the allocation favors lower layers in contrast to the capacity-maximizing
power distribution.

Fig. 13. Optimal power distribution (P = 0 dB).
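
The limit can be verified directly. As a worked step (a sketch of the calculation, not reproduced from the paper): at $b = 0$ the exponent $1/(1+b)$ in the distortion-minimizing cumulative power distribution equals one, the integrand $\frac{1}{s}\bigl(\frac{2}{s} + \frac{f'(s)}{f(s)}\bigr) s^2 f(s)$ reduces to $2 f(s) + s f'(s)$, and this is the derivative of $s f(s) + F(s)$. The last line uses the boundary condition $\gamma_o f(\gamma_o) + F(\gamma_o) = 1$.

```latex
\begin{align*}
\lim_{b \to 0} U(\gamma)
  &= \frac{1}{\gamma^2 f(\gamma)} \int_{\gamma}^{\gamma_o} \bigl( 2 f(s) + s f'(s) \bigr)\, ds \\
  &= \frac{1}{\gamma^2 f(\gamma)} \Bigl[ s f(s) + F(s) \Bigr]_{\gamma}^{\gamma_o} \\
  &= \frac{\gamma_o f(\gamma_o) + F(\gamma_o) - \gamma f(\gamma) - F(\gamma)}{\gamma^2 f(\gamma)} \\
  &= \frac{1 - F(\gamma) - \gamma f(\gamma)}{\gamma^2 f(\gamma)}.
\end{align*}
```

For unit-mean Rayleigh fading this evaluates to $(1 - \gamma)/\gamma^2$, which vanishes at $\gamma = 1$ as required by the upper boundary.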

Fig. 14 shows the minimum expected distortion E[D]* versus SNR for different diversity orders. With
infinite diversity, the channel power gain becomes constant at $\bar{\gamma}$, and the distortion is given by

$$D\big|_{L=\infty} = (1 + \bar{\gamma} P)^{-b}. \tag{75}$$

In the case when there is no diversity ($L = 1$), a lower bound to the expected distortion is also plotted.
The lower bound assumes the system has CSI at the transmitter (CSIT), which allows the transmitter to
concentrate all power at the realized layer to achieve the expected distortion:

$$\mathrm{E}[D_{\mathrm{CSIT}}] = \int_{0}^{\infty} \bar{\gamma}^{-1} e^{-\gamma/\bar{\gamma}} (1 + \gamma P)^{-b}\, d\gamma. \tag{76}$$



Note that at high SNR, the performance benefit from diversity exceeds that from CSIT, especially when
the bandwidth ratio $b$ is large. In particular, in terms of the distortion exponent $\Delta$ [11], it is shown in
[13] that in a MISO or SIMO channel, layered broadcast coding achieves:

$$\Delta \triangleq -\lim_{P \to \infty} \frac{\log \mathrm{E}[D]}{\log P} = \min(b, L), \tag{77}$$

where $L$ is the total diversity order from independent fading blocks and antennas. Moreover, the layered
broadcast coding distortion exponent is shown to be optimal and CSIT does not improve $\Delta$, whereas
diversity increases $\Delta$ up to a maximum as limited by the bandwidth ratio $b$.

Fig. 14. Minimum expected distortion (b = 2).
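
The gap between the two reference curves can be illustrated numerically. The Python sketch below compares the perfect-CSIT lower bound without diversity, assuming unit-mean Rayleigh fading, against the distortion at infinite diversity, where the gain is constant at its mean. The function names, the quadrature rule, and the SNR grid are illustrative assumptions.

```python
import math


def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3


def csit_distortion(P, b):
    """CSIT lower bound for L = 1 with unit mean gain.

    The range is split at gain 1 so the sharp region near zero gets a finer grid.
    """
    def integrand(g):
        return math.exp(-g) * (1 + g * P) ** (-b)

    return simpson(integrand, 0.0, 1.0, 20000) + simpson(integrand, 1.0, 40.0, 4000)


def infinite_diversity_distortion(P, b, gbar=1.0):
    """Distortion when the channel power gain is constant at gbar."""
    return (1 + gbar * P) ** (-b)


b = 2.0
results = {}
for snr_db in (0, 10, 20):
    P = 10 ** (snr_db / 10)
    results[snr_db] = (csit_distortion(P, b), infinite_diversity_distortion(P, b))

# Ratio of the CSIT bound to the infinite-diversity distortion at each SNR.
ratios = {snr: csit / div for snr, (csit, div) in results.items()}
```

Since $(1 + \gamma P)^{-b}$ is convex in $\gamma$, the constant-gain distortion is never above the CSIT bound, and in this sketch the ratio between them widens as the SNR grows.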

IX. Conclusions 

We considered the problem of source-channel coding over a delay-limited fading channel without CSI 
at the transmitter, and derived the optimal power allocation that minimizes the end-to-end distortion 
in the layered broadcast coding transmission scheme with successive refinement. In the allocation of 
transmission power between two layers of codewords in a two-state fading channel, the optimal allocation 
that minimizes the expected distortion has a particular structure that lends itself to be generalized to the 
cases when the channel has multiple discrete fading states or a continuous fading distribution. Specifically, 
the optimal two-layer allocation assigns power first to the higher layer, up to a power ceiling that depends 
only on the channel fading distribution and is independent of the total available power; any surplus over the
power ceiling is allocated to the lower layer. When the channel has multiple discrete fading states, we 
write the minimum expected distortion as a set of recurrence relations, and in each recurrence step the 
two-layer optimization procedure solves the power allocation between the current layer and the aggregate 
higher layer. The optimization framework is extended to consider convex distortion cost functions with 
convex constraints by posing the minimization as a convex optimization problem. We applied the power 
allocation algorithms to the pmf of a discretized Rayleigh fading distribution. We observed that the 
optimal power allocation is concentrated around the middle layers, and within this range the lower layers
are assigned more power than the higher ones. As the SNR increases, the allocations of the higher layers
remain unchanged, and the extra power is allocated to the idle lower layers. The distortion-minimizing
power distribution, therefore, is conservative: it is more beneficial to utilize the lower layers, despite their
weaker channel gains, than the higher layers as the latter have larger risks of being in outage.

June 17, 2009 DRAFT

We also derived the optimal power distribution that minimizes the expected distortion when the fading 
distribution of the channel is given by a continuous probability density function. We computed the optimal 
power distribution for Rayleigh fading channels with diversity order L, and showed that increasing the 
diversity $L$ concentrates the power distribution towards the expected channel power gain $\bar{\gamma}$, while a larger
bandwidth ratio b spreads the power distribution further into the lower layers. On the other hand, in the 
limit as b tends to zero, the optimal power distribution that minimizes expected distortion converges to 
the power distribution that maximizes expected capacity. While the expected distortion can be improved 
by acquiring CSIT or increasing the diversity order, it is shown that at high SNR the performance benefit 
from diversity exceeds that from CSIT, especially when the bandwidth ratio b is large. Under continuous 
channel fading, in this paper we focused on minimizing the expected distortion, which is a linear cost 
function of the distortion realizations. Future research may consider the minimization
of a general convex distortion cost function under continuous channel fading distributions.
Appendix



Algorithm 1 Multiple-Layer Power Allocation

ALLOC(M - 1, p_M, γ_M, p_{M-1}, γ_{M-1})              ▷ Start from top

procedure ALLOC(i, w, β, u, α)
    Compute U from w, β, u, α
    if i = 1 then                                    ▷ Bottom layer
        T*_2 ← min(U, P)
        return
    end if
    if U < P then                                    ▷ Within total power P
        Compute W from U, w, β, u, α
        ALLOC(i - 1, W, α, p_{i-1}, γ_{i-1})            ▷ Unconstrained
        if T*_i ≥ U then
            T*_{i+1} ← U                               ▷ U is feasible
            return
        end if
    end if
    ALLOC(i - 1, w, β, p_{i-1} + u, γ_{i-1})            ▷ Constrained
    T*_{i+1} ← T*_i
end procedure
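
For readers who want to experiment with the recursion, a minimal Python sketch of Algorithm 1 is given below. It is an illustration rather than the authors' implementation. The two-layer quantities are assumptions inferred by substituting the parameter mapping (54) back into (56) and (59): the unconstrained power is taken as $U = \frac{1}{\beta}\bigl(\bigl[\frac{w}{u}\bigl(\frac{\beta}{\alpha} - 1\bigr)\bigr]^{1/(1+b)} - 1\bigr)$ when the bracketed ratio exceeds one and zero otherwise, and the aggregate weight as $W = (1 + \alpha U)^b \bigl[u + w (1 + \beta U)^{-b}\bigr]$. All function names are hypothetical, and the pmf entries are assumed to be strictly positive.

```python
def two_layer_power(w, beta, u, alpha, b):
    """Unconstrained power U for the upper layer (assumed two-layer solution)."""
    ratio = (w / u) * (beta / alpha - 1.0)
    if ratio <= 1.0:
        return 0.0
    return (ratio ** (1.0 / (1.0 + b)) - 1.0) / beta


def allocate(pmf, gains, P, b):
    """Return cumulative powers T[1..M+1], T[i] = power of layers i and above.

    pmf[k] and gains[k] are given for k = 0..M with gains[0] = 0 and M >= 2.
    """
    M = len(gains) - 1
    T = [0.0] * (M + 2)
    T[1] = P

    def alloc(i, w, beta, u, alpha):
        U = two_layer_power(w, beta, u, alpha, b)
        if i == 1:                                   # bottom layer
            T[2] = min(U, P)
            return
        if U < P:                                    # within total power P
            W = (1 + alpha * U) ** b * (u + w * (1 + beta * U) ** (-b))
            alloc(i - 1, W, alpha, pmf[i - 1], gains[i - 1])    # unconstrained
            if T[i] >= U:
                T[i + 1] = U                         # U is feasible
                return
        alloc(i - 1, w, beta, pmf[i - 1] + u, gains[i - 1])     # constrained
        T[i + 1] = T[i]

    alloc(M - 1, pmf[M], gains[M], pmf[M - 1], gains[M - 1])    # start from top
    return T


def expected_distortion(T, pmf, gains, b):
    """Expected distortion for cumulative powers T, using the product form of D^(k)."""
    M = len(gains) - 1
    total, D = pmf[0], 1.0
    for k in range(1, M + 1):
        D *= ((1 + gains[k] * T[k]) / (1 + gains[k] * T[k + 1])) ** (-b)
        total += pmf[k] * D
    return total


gains = [0.0, 0.5, 1.0, 1.5]       # illustrative gamma_0..gamma_3
pmf_a = [0.1, 0.3, 0.4, 0.2]       # illustrative pmf, weak top layer
pmf_b = [0.05, 0.05, 0.1, 0.8]     # illustrative pmf, strong top layer
T_a = allocate(pmf_a, gains, P=2.0, b=1.0)
T_b = allocate(pmf_b, gains, P=1.0, b=1.0)
```

The layer powers follow as `T[i] - T[i + 1]`. In the first example the top layer is left inactive, while in the second the lowest layer receives no power; a coarse grid search over the cumulative powers is one way to cross-check either result.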